IDENTIFYING, PROCESSING AND CACHING OBJECT FRAGMENTS IN A WEB ENVIRONMENT

FIELD OF THE INVENTION

The present invention relates generally to the analysis of the content of a digital document and in particular to the creation and maintenance of persistent fragment identities to facilitate caching.

BACKGROUND

With the rapid growth of the Internet, the need for efficient document exchange becomes increasingly important. In addition to the hypertext markup language (HTML), Extensible Markup Languages (XML) are becoming available that provide a meta-language for authors to design their own markup language.

On the other hand, the proliferation of various non-PC computing devices, including: handheld devices; palmtop devices; and various other Microsoft WINDOWS CE based devices; set-top boxes; WEB TV; smart phones; and so-called Internet appliances (hereinafter all referred to as Internet appliances), further complicates the presentation of a Web document to a client device. In a Web document based on HTML, images are treated as separate objects pointed to by the Web document. A proxy/Web server may generate a lower resolution version or a black and white version of a color image to accommodate the limited capability of the Internet appliance. Nonetheless, these images are named persistent objects (i.e., they have separate identities which are their URLs). The proxy or Web server is merely trying to provide different versions of a named entity based on the capability of a receiving device. This is independent of any caching issues at the proxy or Web server to improve object access time.

Various work exists to provide different versions of a named object in the Web environment to support Internet appliance access to the Web. For example, PRISM from Spyglass (see e.g., http://www.spyglass.com) provides different versions of images to the Internet appliance. It can also dynamically translate richly formatted Web documents into simplified Web pages to accommodate the requirements of the receiving devices. A means for performing on-demand data type-specific lossy compression on semantically typed data and tailoring content to the specific constraints of the clients is described in "Adapting to Network and Client Variability via On-Demand Dynamic Distillation," by A. Fox, et al., Proc. 7th Intl. Conference on Architectural Support for Programming Languages and Operating Systems, Oct. 1996.

Using formal descriptors, such as a markup language, to describe a digital document provides tremendous flexibility. In the Internet environment, more powerful markup languages such as XML, or a subset of the Standard Generalized Markup Language (SGML) (see e.g., ISO 8879/1986; and Designing XML Internet Applications, by M. Leventhal, et al., Prentice Hall, 1998), are being defined to augment HTML. The markup language description can provide rich information on the document structure and the final document to be generated. In fact, XML is a language that allows users to define their own language. For example, chemists can define a chemical markup language to describe a molecular structure. Mathematicians or scientists can define a math markup language to describe complex mathematical formulas. The interpretation of the markup language description and generation of the object can thus be complex. It is desirable to avoid regeneration of the same description repeatedly. Since Web pages, objects or documents on a common subject, or from the same company/division/department or authors often have parts in common, there is a need to go beyond recognizing just the repeated references to named entities (i.e., subject already has a name, e.g., URL) to subparts of named entities.

However, proxy or Web servers and client browsers today do not interpret the markup language to decompose a document or object into components, provide persistent identities and tracking mechanisms to facilitate caching and recognition of repeated occurrences of components of a [illegible] processing [illegible] example, as mentioned previously, in HTML the text documents and [illegible] separated out from the text documents by [illegible] hence cacheable entities. Another problem is that if a document includes dynamic content, caching is not meaningful as the next reference to the same document URL can result in a different version of the document. Thus a document is not cached even if only a small fraction of its content is dynamic. This is an issue for HTML documents today and is expected to become more severe for XML documents, which are more flexible and make it easier to incorporate various types of dynamic information, such as data from a database.

[illegible] identifying and creating one or more persistent object fragments [illegible], for example to facilitate caching. The present invention addresses this need.

SUMMARY

In accordance with the aforementioned needs, the present invention is directed to a method and apparatus for identifying and creating persistent object fragments from a named object. In one example, the present invention is directed to a method and apparatus for dynamically parsing a digital content description of a named digital object, creating and maintaining fragment identities to facilitate caching. Examples of named digital objects include but are not limited to: Web pages described in XML, SGML, and HTML.

The present invention has features which can parse/analyze the object description, identify object fragments and create persistent object fragment identities, and revise the object description by replacing each object fragment with its newly created persistent identity and send the revised object description to the requesting node. Depending upon the properties of a fragment, this can either enable the fragment to be cacheable (which can be at the content/proxy server and the client device in the Web environment), or make the revised object description cacheable at the server and client device. For example, consider the object description of a purchase order which contains a dynamic part to retrieve the current price of a product from the database. This dynamic part may be a small portion of the purchase order, but would prevent the object from being cached. According to one feature of the present invention for recognizing and treating the dynamic part as a separate fragment from the object description, the revised document becomes static and therefore cacheable. Furthermore, fragments can be nested.

A method is also provided to determine which part/segment of a named object to recognize as a fragment identity, based on its properties, which can include its size, processing cost to generate that segment of the object from its description, and other properties such as static vs. dynamic.
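The selection of segments by size, processing cost, and static versus dynamic nature can be illustrated with a minimal sketch. It is not the claimed method: the `Segment` record, the ratio used in `fragment_value`, the `min_value` threshold, and the sample segment names and numbers are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Properties of one candidate segment (all fields are illustrative)."""
    name: str
    size_bytes: int        # size of the rendered segment
    cpu_seconds: float     # cost to generate it from its description
    dynamic: bool = False  # True if it embeds dynamic data, e.g. a database price


def fragment_value(seg: Segment) -> float:
    """Assumed value function: CPU seconds saved per byte of extra storage."""
    return seg.cpu_seconds / seg.size_bytes


def is_eligible(seg: Segment, min_value: float = 1e-4) -> bool:
    """Decide whether to give the segment its own persistent fragment identity."""
    if seg.dynamic:
        # Separating a dynamic part leaves the rest of the object static.
        return True
    return fragment_value(seg) >= min_value


# Hypothetical candidates: an expensive small segment, a cheap large one,
# and a small dynamic one.
candidates = [
    Segment("molecule_image", size_bytes=10 * 1024, cpu_seconds=100.0),
    Segment("order_table", size_bytes=1000 * 1024, cpu_seconds=1.0),
    Segment("current_price", size_bytes=64, cpu_seconds=0.01, dynamic=True),
]
selected = [seg.name for seg in candidates if is_eligible(seg)]
print(selected)
```

Under these assumed numbers, the expensive but small segment and the dynamic segment are selected, while the cheap but large segment is left inline. A different value function or threshold would shift that boundary.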
[illegible] object fragment identity for a persistent object fragment based on one or more of formal descriptors or an [illegible] receives the revised object description [illegible] to the client. The client [illegible] box, or an Internet appliance.

BRIEF DESCRIPTION OF THE DRAWINGS

These and further objects, advantages, and features of the invention will be more apparent from the following detailed description of a preferred embodiment and the appended drawings wherein:

FIG. 1 is a diagram of an Internet environment having features of the present invention;

FIG. 2 is a more detailed example of a network environment having features of the present invention;

FIG. 3 depicts an example of a digital document using a markup language;

FIG. 4 depicts an example of a modified document;

FIG. 5 depicts the data structure of the fragment description table;

FIG. 6 is an example of the server logic of FIG. 2;

FIG. 7 is an example of the object request handler;

FIG. 8 is an example of the object parser;

FIG. 9 is an example of the next segment locator;

FIG. 10 is an example of the persistent name creator;

FIG. 11 is an example of the fragment request handler;

FIG. 12 is an example of the fragment cache manager, and [illegible]

DETAILED DESCRIPTION

FIG. 1 depicts an example of an Internet [illegible]

[illegible] information can either reside in persistent storage (260) [illegible] (245). [illegible] where each segment (330) is enclosed by [illegible]. For example, <cml [illegible] start-tag. Thus [illegible] the document to [illegible] preceding "start-tag" of the [illegible] level in markup language [illegible] segment can have a DTD (document type definition) to describe the semantics of the markup. [illegible] a subset of the segments contained in a document and recognize them as [illegible]. Fragment creation eligibility [illegible] considered. For each persistent fragment, a persistent identity or name is [illegible] that if the object fragment appears [illegible] multiple times in the same object, it will be recognized as the same fragment. [illegible] eligibility [illegible] as to make [illegible] the client or [illegible] a dynamic segment as an [illegible] object fragment [illegible] device such as WINDOWS [illegible] based [illegible] document and let the [illegible] of the fragment or other [illegible]
[illegible] separated out, may need to be requested separately with additional requests from the client. Thus, preferably, only a segment or group of segments that meet [illegible] processing requirements [illegible] consideration is the additional storage [illegible] rendered segment. For example, consider two cases. In a first case, the processing time is 100 seconds of CPU time to generate the segment from the description, and the size of the rendered segment is 10K bytes. In a second case, the processing time is 1 second of CPU time to generate the segment from the description, and the size of the rendered segment is 1000K bytes. In case 1, the savings on CPU time is substantial while the additional storage cost is minimal. The opposite is true for the second case. In other words, only in the first case is it worthwhile to recognize the segment as a separate fragment for caching. In the preferred [illegible] determine the value of recognizing a fragment. An [illegible] will be processing cost (in seconds [illegible]) [illegible] that of the additional storage [illegible]

[illegible] molecule> and the second segment begins with a [illegible], whereas the third segment begins with a start-tag, <db:price>, and finishes with an end-tag </db:price>. Assume the semantics of the three [illegible]. Assume the first segment provides an image of a molecular structure of a chemical compound. Assume also the second segment contains a formula to generate an order table showing the price at different quantities. Assume further, the third segment retrieves the price [illegible] from a database. Hence it is a segment with dynamic information.

FIG. 4 depicts an example of a modified Web document after the persistent fragments have been recognized and extracted. Here it is assumed that generating the molecular structure of the chemical compound in the first segment is quite complex, whereas the computation of the order table is straightforward. Hence, only the first and the third segments (330') are recognized as persistent fragments with the identities "125.1" and "28.3", respectively. In the preferred embodiment, each of the persistent fragments is replaced with an <include> statement [illegible] to the name of the fragment, e.g. <include HREF="125.1">, indicating the reference to the fragment 125.1, and followed by a <include> statement.

FIG. 5 depicts an example of a fragment description table for tracking the object fragment identity and its description. [illegible] persistent name of the fragment. [illegible] name is given by the persistent name creator routine (with [illegible]). Fdescription (540) is the [illegible] description table entry (507).

[illegible] example of the server logic (240) [illegible]. Depending upon the type [illegible] step [illegible] be invoked. If at step [illegible] object request, the object request handler [illegible] reference [illegible] invoked in step 615. [illegible] in a Web environment [illegible] object request [illegible] basis that an object [illegible] (with details described with reference to FIG. 11) is invoked. [illegible] the current invention and thus will not be [illegible]

[illegible] example of the object request handler. In step 705, it is first checked whether the requested object is cached in the object cache maintained by this computing node. If the object is cached, in step 710, the cached object [illegible] (FIG. 8) is invoked to [illegible] object description (which may [illegible] modified by the object parser) should be cached [illegible] object cache. The object cache manager is similar to a conventional [illegible] manager that caches the Web [illegible] management policy such as LRU (least recently used), or its variants to take into [illegible] tradeoff between object size, update [illegible] and [illegible] since last reference (i.e., the reference [illegible]) [illegible] incorporated by reference in its entirety, wherein variants of LRU [illegible] Web objects are described.

FIG. 8 depicts an example of the object parser depicted in [illegible] a "tag_stack" and a "segment_stack" during its parsing to identify persistent fragments. The tag_stack [illegible] "start-tag" scanned, but whose matching end-tag has not been encountered during scanning of the [illegible]. The segment_stack includes segments [illegible] as fragments, but have the [illegible] to form a fragment. As depicted, in step 805, the two stacks are initialized to null. In step 810, a variable, txt, is set
equal to the object description. In step 820, a next segment locator is invoked (with details described with reference to FIG. 9) to identify the next segment, Nsegment, in txt. In step 825, it is checked if Nsegment is null. If so, the processing of txt is completed. Otherwise, in step 830, it will delete segments in the segment_stack that are included in Nsegment, if any. In step 835, it is checked whether Nsegment satisfies the fragment creation eligibility criterion. If so, in step 840 a persistent name creator routine (with details depicted in FIG. 10) is invoked to create a persistent fragment identity for the segment. In step 845, the txt is modified to replace the fragment description with an <include> statement to reference the persistent fragment name followed by an <include> as described in FIG. 4. In step 855, the Nsegment is combined with its adjacent peer segments on the segment_stack, if any, where a peer segment is a segment at the same level (i.e., with the same parent) of the Nsegment in a nested markup language description. In step 860, it is checked if the combined segment satisfies the fragment creation eligibility criterion. If so, in step 865, these adjacent peer segments are removed from the segment_stack. Otherwise, in step 870, the Nsegment is added to segment_stack.

FIG. 9 depicts a more detailed example of the next segment locator (FIG. 8, step 820). As depicted, in step 910, it is checked if the next token is null, where a token is a consecutive string of characters delimited between blanks (or some other delimiters defined by the markup language). If so, in step 915, the Nsegment is set to null. Otherwise, in step 920, it is checked if the next token is a "start-tag" type token. If so, the token is inserted into the tag_stack with an associated "token position value" set to its starting position in the txt variable. In step 930, it is checked if the next token is an "end-tag" type token. If so, in step 940, the Nsegment is set to the substring in txt starting from the token position value indicated by the top element of the tag_stack to the "end-tag" token. In step 945, the top element in the tag_stack is removed.

FIG. 10 depicts a more detailed example of the persistent name creator (FIG. 8, step 840). As depicted, in step 1005, the fragment description is obtained from txt. In step 1010, the fragment description is mapped into a number which corresponds to an entry of the fragment description table. Those skilled in the art will appreciate that there are many alternative mapping functions. For example, this can be done by performing an exclusive-or of all the characters in the fragment description and then treating the result as an integer to divide it by the number of entries in the fragment description table. The remainder will serve as the index to the fragment description table. In step 1020, it is checked if the segment description already appeared in the fragment description list of the said entry in the fragment description table. If so, in step 1040, the fragment name of the matching fragment description will be returned. Otherwise, in step 1025, a new persistent name is created for the fragment. There are many ways to create a unique name for the fragment. One way is to maintain a counter for each entry of the fragment description table to track the number of distinct fragment descriptions that have been mapped to this entry. The name given to the new fragment will be the value of its entry to the fragment description table augmented with the current value of the counter associated with the said entry. For example, if a fragment description is mapped to the 26th entry of the fragment description table and there are already 5 distinct fragments previously mapped to this entry, the persistent name for the new fragment will be "26.6". In step 1030, the fragment name and its description are added to the fragment description list of the corresponding entry in the fragment description table. In step 1035, the persistent name created is returned.

FIG. 11 depicts an example of the fragment request handler (FIG. 6, step 625). As depicted, in step 1105, it is determined which version of the fragment needs to be generated and returned to the requesting client, if multiple versions are available. A degenerate case is that only one version is available, e.g., a proxy server only has code to generate one version of a fragment. In step 1110, it is checked whether the requested version is cached in the fragment cache. If so, in step 1150, the requested version is returned to the requesting node. In step 1160, the fragment [illegible]. In the preferred embodiment, an LRU cache management policy is [illegible] LRU chain. In step 1120, for the case where the [illegible] description from the fragment description table. In step 1125, the fragment is generated based on the fragment description and the client requirement. In the preferred embodiment, each type of markup language describing the fragment can have its own DTD to provide its semantics. For each type of DTD, there can be different ways of generating/rendering the fragment based on the characteristics of the requesting devices, such as processing power, storage capacity, and communication bandwidth. This can be described in a GTD (Generator Table Definition) on how to generate a different version for a given DTD to satisfy the requirement of a specific receiving device. The GTD is separate from the DTD. It can be provided by a third party such as the Internet appliance manufacturer or other software manufacturer. In step 1135, the requested fragment version is returned to the requester. In step 1140, the fragment cache manager (with details described with reference to FIG. 12) is invoked.

FIG. 12 depicts an example of the fragment cache manager. In the preferred embodiment, the fragment cache manager uses an LRU type replacement policy. As depicted, in step 1205, it is checked whether there is enough free space in the fragment cache to cache the requested fragment ($O_c$). If so, fragment $O_c$ is cached in the fragment cache. Otherwise, in step 1215, it determines the minimum k value such that the bottom k fragments, $O_{t_1}, \ldots, O_{t_k}$, in the LRU stack of the fragment cache will have a total size larger than that of fragment $O_c$. In step 1220, it is checked based on the value function (F) whether it is more desirable to cache $O_c$ or $\{O_{t_1}, \ldots, O_{t_k}\}$. The total processing cost to generate $\{O_{t_1}, \ldots, O_{t_k}\}$ is the sum of the processing cost of each $O_{t_i}$, $1 \le i \le k$, and the additional storage requirement to store $\{O_{t_1}, \ldots, O_{t_k}\}$ is the sum of the size of each $O_{t_i}$, $1 \le i \le k$. If $O_c$ is more valuable with a large F function value, in step 1225, $\{O_{t_1}, \ldots, O_{t_k}\}$ is deleted to make room to cache $O_c$. In step 1230, the reference statistics for the fragment version are updated for the fragment cache manager to manage its LRU cache.

To facilitate garbage collection of fragment descriptions that are no longer in use, an object-fragment table can be maintained which tracks the fragments created for each object, and a fragment-object table to track all objects containing a common fragment. After an object is updated, on its next reference, the object parser may detect that the object now contains some new fragments and some fragments previously contained in the object are no longer in it. It will then check for each fragment no longer in use by the object whether there is any other object containing it based on the fragment-object table. If so, the fragment description element in FIG. 5 will be deleted from the fragment descrip-



